This document provides a clear and simple guide to network detection and visualization in R. Below, I leverage the igraph, visNetwork, and DiagrammeR packages to effectively detect and visualize transaction networks.

The transaction data used in this example was generated using the randomNames package. The code below can be used to analyze almost any financial institution transaction dataset, as the only variables used are originator, beneficiary, and transaction amount. More generally, the code is useful for analyzing almost any dataset in which there is a sender and a recipient. Other applications include email communication and social media activity.

Begin by loading the relevant packages and the transaction dataset.

library(dplyr)
library(igraph)
library(magrittr)
library(visNetwork)
library(DiagrammeR)
library(data.table)

mydata <- fread("~/Desktop/R/Data/Network Analysis Transaction Data.csv", header = T, stringsAsFactors=FALSE)
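The CSV itself is not included here, so the following is a small hypothetical stand-in with the same three variables. The object name toy_data and all values are made up for illustration; only the column roles (originator, beneficiary, transaction amount) come from the description above.

```r
# Hypothetical stand-in for the transaction CSV; names and amounts are invented.
toy_data <- data.frame(
  originator         = c("A", "A", "B", "D", "D"),
  beneficiary        = c("B", "B", "C", "E", "E"),
  transaction_amount = c(100, 250, 75, 40, 60),
  stringsAsFactors   = FALSE
)

# One row per transaction: who sent, who received, and how much.
str(toy_data)
```

Any dataset with this shape, where the first two columns identify sender and recipient, should work with the steps that follow.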

Network Detection

Next, create an igraph graph object.

graph <- graph.data.frame(mydata, directed=F)

Then simplify the graph to remove loops and multiple edges. During the simplification step, I compute summary statistics for the combined edges: summing the edge weights gives the number of transactions per counterparty pair (total volume), and summing transaction_amount gives the total principal per counterparty pair.

E(graph)$weight <- 1
graph <- simplify(graph, edge.attr.comb=list(weight = "sum", transaction_amount = "sum", function(x)length(x)))
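To make the effect of the simplification step concrete, here is a minimal sketch on made-up data. It assumes a reasonably recent igraph and uses the newer function names (graph_from_data_frame and as_data_frame in place of graph.data.frame and get.data.frame); toy_data and the amounts are hypothetical.

```r
library(igraph)

# Hypothetical transactions: A pays B twice, B pays C once, D pays E twice.
toy_data <- data.frame(
  originator         = c("A", "A", "B", "D", "D"),
  beneficiary        = c("B", "B", "C", "E", "E"),
  transaction_amount = c(100, 250, 75, 40, 60),
  stringsAsFactors   = FALSE
)

toy_graph <- graph_from_data_frame(toy_data, directed = FALSE)

# Give every transaction a weight of 1 so that summing weights counts transactions.
E(toy_graph)$weight <- 1

# Collapse parallel edges; both attributes are summed per counterparty pair.
toy_simple <- simplify(
  toy_graph,
  edge.attr.comb = list(weight = "sum", transaction_amount = "sum")
)

# One row per counterparty pair with its transaction count and total amount.
pair_totals <- igraph::as_data_frame(toy_simple, what = "edges")
print(pair_totals)
```

After simplification each counterparty pair appears once, with weight holding the number of transactions and transaction_amount holding the pair total.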

Next, use the clusters function to calculate the connected components of the graph and assign the cluster id as a vertex attribute.

networks <- clusters(as.undirected(graph))
V(graph)$network <- networks$membership
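As a rough illustration of what the membership vector contains, the sketch below runs the same idea on a tiny made-up graph. It assumes a recent igraph, where components is the current name for clusters; toy_edges and network_lookup are hypothetical names.

```r
library(igraph)

# Hypothetical edge list with two separate groups: {A, B, C} and {D, E}.
toy_edges <- data.frame(
  from = c("A", "B", "D"),
  to   = c("B", "C", "E"),
  stringsAsFactors = FALSE
)
toy_graph <- graph_from_data_frame(toy_edges, directed = FALSE)

# components() returns membership (one id per vertex), csize, and no.
comp <- components(toy_graph)
V(toy_graph)$network <- comp$membership

# Lookup table from vertex name to network id.
network_lookup <- data.frame(
  name    = V(toy_graph)$name,
  network = V(toy_graph)$network,
  stringsAsFactors = FALSE
)
print(network_lookup)
```

Vertices that can reach each other share a network id, which is what makes the id usable as a grouping variable later on.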

Then convert the igraph graph into a data frame containing the vertex and cluster id and merge the data frame with the original data.

nodes <- get.data.frame(graph, what="vertices")
dt <- data.table(merge(mydata, nodes, by.x=c("originator"), by.y=c("name")))
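The merge step can be sketched in base R on made-up data. The tables below are hypothetical stand-ins for mydata and the vertex data frame, and the per-network total at the end is just one example of what the merged table makes easy.

```r
# Hypothetical transactions and a vertex table carrying the network id.
transactions <- data.frame(
  originator         = c("A", "A", "B", "D", "D"),
  beneficiary        = c("B", "B", "C", "E", "E"),
  transaction_amount = c(100, 250, 75, 40, 60),
  stringsAsFactors   = FALSE
)
vertex_info <- data.frame(
  name    = c("A", "B", "C", "D", "E"),
  network = c(1, 1, 1, 2, 2),
  stringsAsFactors = FALSE
)

# Attach the originator's network id to every transaction.
merged <- merge(transactions, vertex_info, by.x = "originator", by.y = "name")

# Example use: total principal per network.
totals <- aggregate(transaction_amount ~ network, data = merged, FUN = sum)
print(totals)
```

Because both parties to a transaction belong to the same connected component, joining on the originator alone is enough to label each transaction with its network.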

The above steps enable me to easily and efficiently obtain node and edge attribute information when creating the visNetwork and DiagrammeR graph objects below.

Now for the graph.

Network Visualization

While network detection is quite simple, effective network visualization can be quite challenging when working with large transaction datasets in which there can be many individuals in a network and frequent, overlapping network connections. In addition, most analyses on the internet that explain how to visualize and analyze networks discuss more advanced network analysis techniques than one really requires when visualizing and analyzing a transaction dataset.

Certain packages also make it difficult to identify the individual nodes and connections. I have found that visNetwork and DiagrammeR are two of the most effective packages when one needs to visualize and analyze transaction networks. Ultimately, the most useful type of network visual may depend on the size and structure of the network, as well as one’s goals.

visNetwork

The visNetwork package is truly a revelation. The package enables the user to easily and efficiently visualize networks as well as select nodes in order to highlight clients and their counterparties in the larger network graph.

Below, I create a visNetwork network graph.

nodes <- data.frame(id = nodes$name, title = nodes$name, group = nodes$network)
nodes <- nodes[order(nodes$id, decreasing = F),]

edges <- get.data.frame(graph, what="edges")[1:2]

visNetwork(nodes, edges) %>%
  visOptions(highlightNearest = TRUE, nodesIdSelection = TRUE) %>%
  visGroups(groupname = "1", color = "maroon")


Not only can the user select the relevant node using visNetwork’s nodesIdSelection option, she can also zoom in and out in order to explore different parts of the graph. Moreover, the layout of the visNetwork network graph is clean and clear compared to other interactive network graphs.
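As a hedged sketch of the input visNetwork expects, the example below builds a nodes table with an id column and an edges table with from and to columns. The value and title columns on the edges are an addition that is not part of the original code: to my understanding visNetwork uses value to scale edge width and title as a hover tooltip. All names and amounts are hypothetical, and the widget is only built if visNetwork is installed.

```r
# Hypothetical nodes and pair-level edges for a single small network.
toy_nodes <- data.frame(
  id    = c("A", "B", "C"),
  title = c("A", "B", "C"),   # tooltip shown when hovering over a node
  group = c(1, 1, 1),
  stringsAsFactors = FALSE
)
toy_edges <- data.frame(
  from               = c("A", "B"),
  to                 = c("B", "C"),
  transaction_amount = c(350, 75),
  stringsAsFactors   = FALSE
)

# Assumption: scale edge width by pair total and show the total on hover.
toy_edges$value <- toy_edges$transaction_amount
toy_edges$title <- paste("Total:", toy_edges$transaction_amount)

# Build the widget only when the package is available.
if (requireNamespace("visNetwork", quietly = TRUE)) {
  widget <- visNetwork::visNetwork(toy_nodes, toy_edges)
  widget <- visNetwork::visOptions(widget, highlightNearest = TRUE,
                                   nodesIdSelection = TRUE)
  # Typing `widget` at the console or in a notebook displays it.
}
```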

DiagrammeR

As noted above, the most useful type of network visualization and analysis may depend on the size and structure of the network, as well as one’s goals. While it is hard to beat visNetwork’s network visualization capabilities, DiagrammeR does enable the user to add originator-beneficiary pair aggregate transaction totals and other information to the network graph. In addition, DiagrammeR has additional features that enable one to isolate high-risk clients, perform network analysis, and work with network graphs. Moreover, DiagrammeR provides the user with an extraordinary amount of control over node and edge attributes. As is true with visNetwork, DiagrammeR makes it possible to visualize a ‘clean’ network and truly see the individual nodes and connections. See http://rich-iannone.github.io/DiagrammeR/graphs.html for additional details.

When using DiagrammeR, I personally prefer the following layouts:

  • Circo layout: Smaller network graphs with overlapping edges and complexity
  • Visnetwork: Small or Medium Interactive network graphs. Visnetwork enables the viewer to zoom in and out in order to explore different parts of the graph. Moreover, the layout of visnetwork network graph is clear and clean compared to other interactive network graphs.

Below I use DiagrammeR functions, strung together with the magrittr %>% pipe, to create and render network graphs that use the circo and visnetwork layouts. The create_graph function creates a dgr_graph object. The render_graph function, which enables the user to both visualize the network(s) and create output files, requires a dgr_graph object created using the create_graph function.

Circo Layout

I prefer using DiagrammeR’s circo layout for smaller network graphs with overlapping edges and complexity. To illustrate the circo layout, I will subset the smallest network in the transaction dataset.

dg <- decompose.graph(graph)
net_1 <- dg[which(networks$csize == min(networks$csize))][[1]]
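A variation on the same selection, sketched on made-up data, measures each decomposed component directly rather than indexing by the sizes stored in the clusters result. This is an alternative, not the approach used above; it assumes a recent igraph, where decompose is the current name for decompose.graph.

```r
library(igraph)

# Hypothetical graph with a three-node component and a two-node component.
toy_edges <- data.frame(
  from = c("A", "B", "D"),
  to   = c("B", "C", "E"),
  stringsAsFactors = FALSE
)
toy_graph <- graph_from_data_frame(toy_edges, directed = FALSE)

# Split the graph into one igraph object per connected component.
parts <- decompose(toy_graph)

# Count vertices in each component, then pick the extremes.
part_sizes <- sapply(parts, vcount)
smallest <- parts[[which.min(part_sizes)]]
largest  <- parts[[which.max(part_sizes)]]

print(V(smallest)$name)
print(V(largest)$name)
```

Measuring the components themselves avoids relying on the two functions returning components in the same order, and which.min or which.max always returns a single index even when sizes are tied.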

Then, I create the nodes and edges. The type and rel attributes for nodes and edges, respectively, are optional but important for any data modelling work.

nodes_df1 <- create_nodes(nodes = unique(V(net_1)$name), type = "person", color = "gray")
edges_df1 <- create_edges(from = get.edgelist(net_1)[,1], 
                         to = get.edgelist(net_1)[,2], 
                         rel = E(net_1)$transaction_amount,
                         color = "maroon")
edges_df1$rel <- as.numeric(edges_df1$rel)

Next, I create a network graph using the smallest network and the circo layout.

graph_attrs <- c("layout = circo")

graph1 <- create_graph(nodes_df = nodes_df1, edges_df = edges_df1, graph_attrs = graph_attrs) %>%
  set_graph_name("network1") %>%
  set_global_graph_attr("graph","output","circo")%>%
  set_global_graph_attr("graph", "principal", sum(edges_df1$rel))

graph1 %>% render_graph(width = 1000, height = 500)

[Figure: circo-layout rendering of the smallest network, showing 13 named individuals and 18 directed edges between counterparties, for example Ricardo Diaz -> Monaco Rosario Marcos and Humberto Zavala Castro -> Dayana Valdez.]

Visnetwork Layout

Below I create a DiagrammeR visnetwork network graph using the largest network in the transaction dataset.

net_2 <- dg[which(networks$csize == max(networks$csize))][[1]]
nodes_df2 <- create_nodes(nodes = unique(V(net_2)$name), type = "person", color = "gray")
edges_df2 <- create_edges(from = get.edgelist(net_2)[,1], 
                         to = get.edgelist(net_2)[,2], 
                         rel = E(net_2)$transaction_amount, 
                         color = "maroon")
edges_df2$rel <- as.numeric(edges_df2$rel)

graph2 <- create_graph(nodes_df = nodes_df2, edges_df = edges_df2) %>%
  set_graph_name("network2") %>%
  set_global_graph_attr("graph","output","visNetwork")%>%
  set_global_graph_attr("graph", "principal", sum(edges_df2$rel)) 

graph2 %>% render_graph(width = 1000, height = 500)

DiagrammeR provides a convenient means to work with multiple graphs using a graph series object. The time and sequence properties of the series graphs can be used for subsetting. I will create an empty graph series object and then add each network graph to the series.

series <- create_series(series_type = "sequential", series_name = "series")

series <- graph1 %>% add_to_series(series)
series <- graph2 %>% add_to_series(series)

Community Detection and Visualization

If the network graph is extremely large, then it may be necessary to explore other options, such as graphing and analyzing communities (subnetworks) within the network rather than the entire network. Below I detect the communities within the larger networks, create a graph series of communities or subnetworks in the transaction dataset, then use DiagrammeR’s render_graph_from_series function to render a single community from the graph series.

First, I detect the communities using the fast and greedy community detection algorithm.

fc <- fastgreedy.community(graph)
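To show what the community object holds, here is a minimal sketch on a made-up graph of two triangles joined by a single edge. It assumes a recent igraph, where cluster_fast_greedy and induced_subgraph are the current names for fastgreedy.community and induced.subgraph; the vertex names are hypothetical.

```r
library(igraph)

# Hypothetical graph: triangle A-B-C and triangle D-E-F, linked by C-D.
toy_edges <- data.frame(
  from = c("A", "A", "B", "D", "D", "E", "C"),
  to   = c("B", "C", "C", "E", "F", "F", "D"),
  stringsAsFactors = FALSE
)
toy_graph <- graph_from_data_frame(toy_edges, directed = FALSE)

# Greedy modularity optimization; requires a graph without multiple edges.
fc_toy <- cluster_fast_greedy(toy_graph)

# membership() gives one community id per vertex; sizes() counts vertices per id.
print(membership(fc_toy))
print(sizes(fc_toy))

# Keep only communities with more than one member and extract each as a subgraph.
keep <- which(as.vector(sizes(fc_toy)) > 1)
subgraphs <- lapply(keep, function(g) {
  induced_subgraph(toy_graph, which(membership(fc_toy) == g))
})
```

Filtering on community size before building subgraphs is one way to avoid creating empty graphs for singleton communities.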

Next, I create a network graph for each community and add each graph to a graph series object so that the graph series object will contain all community graphs.

community_series <- create_series(series_type = "sequential", series_name = "community_series")

for (g in unique(membership(fc))) {
  # Vertices in community g; a singleton community yields an empty selection
  subg <- induced.subgraph(graph, which((membership(fc) == g) & (sizes(fc)[[g]] != 1)))

  nodes_sub <- create_nodes(nodes = unique(V(subg)$name), type = "person", color = "gray")
  edges_sub <- create_edges(from = get.edgelist(subg)[,1],
                            to = get.edgelist(subg)[,2],
                            rel = E(subg)$transaction_amount,
                            color = "maroon")

  graph_sub <- create_graph(nodes_df = nodes_sub, edges_df = edges_sub) %>%
    set_graph_name(paste0("community_", toString(g))) %>%
    set_global_graph_attr("graph", "output", "visNetwork")

  # Append this community's graph to the series
  community_series <- graph_sub %>% add_to_series(community_series)
}

To render a community from the graph series, I simply use DiagrammeR’s render_graph_from_series function and specify that I want to render the fourth sequential graph in the series. Each graph was named ‘community_’ followed by its community id in the loop above.

render_graph_from_series(graph_series = community_series,
                         graph_no = 4)

Network Analysis

While network visualization is extremely useful when analyzing transaction data, visuals are really just the tip of the iceberg. Several functions in the DiagrammeR and igraph packages allow the user to easily obtain general network graph information, as well as specific node and edge information about the current state of the graph object.

Inspecting the Graph

Upon identifying a large network with suspicious network structures in a transaction dataset network graph, it is necessary to obtain basic network statistics, including node count and edge count. Below, I will examine network2 in the graph series object, which corresponds to the second network in the igraph network graph as I have not reordered the networks during the above network visualization.

Node Information

I begin analyzing the nodes by combining graph1 and graph2 to create a single graph object and analyze all nodes at once.

all_graphs <- combine_graphs(graph1, graph2)

DiagrammeR’s node_info function enables the user to obtain information about each node, including label, type, degree, indegree, outdegree, and loops.

node_info <- node_info(all_graphs)

Using node_info and dplyr::filter is an efficient way to determine which nodes are highly connected in this graph. In the context of transaction analysis, this enables the analyst to determine which clients have a large number of counterparties and isolate those clients who send transactions to and/or receive transactions from a large number of counterparties. The actual number of counterparties that could indicate suspicious activity or a need for further investigation would depend on factors such as the transaction dataset start and end date.

First, I will identify highly connected nodes, or individuals who have over ten counterparties.

highly_connect <- filter(node_info(all_graphs), degree > 10)$node

Next, I will isolate those individuals in the transaction dataset who either send transactions to or receive transactions from more than five counterparties.

high_indegree <- filter(node_info(all_graphs), indegree > 5)$node
high_outdegree <- filter(node_info(all_graphs), outdegree > 5)$node
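For comparison, the same kind of screen can be sketched directly in igraph using its degree function on a directed graph. This is an igraph-based alternative rather than the DiagrammeR approach above; the data and the threshold of two counterparties are made up for illustration.

```r
library(igraph)

# Hypothetical directed transactions: A sends to three parties, D receives from three.
toy_data <- data.frame(
  originator  = c("A", "A", "A", "B", "C"),
  beneficiary = c("B", "C", "D", "D", "D"),
  stringsAsFactors = FALSE
)
toy_directed <- graph_from_data_frame(toy_data, directed = TRUE)

# Per-node counts of incoming, outgoing, and total connections.
deg <- data.frame(
  name      = V(toy_directed)$name,
  indegree  = degree(toy_directed, mode = "in"),
  outdegree = degree(toy_directed, mode = "out"),
  degree    = degree(toy_directed, mode = "all"),
  stringsAsFactors = FALSE
)

# Illustrative threshold: more than two counterparties in one direction.
high_out <- deg$name[deg$outdegree > 2]
high_in  <- deg$name[deg$indegree > 2]

print(deg)
```

Note that in-degree and out-degree only differ when the graph is built as directed; on an undirected graph both equal the total degree.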

To obtain all counterparties of a given node in an igraph object, simply use the neighbors function. The example below returns the names of the neighbors of the fourth vertex.

neighbors <- V(graph)$name[neighbors(graph, 4)]